Automatic Speech Recognition in Adverse Acoustic Conditions
Abstract
Automatic Speech Recognition (ASR) technology has matured to the extent that it can be used successfully in various applications. However, it is by no means the “solved problem” that some marketing campaigns promote it to be. One of the biggest challenges that operational ASR systems face is maintaining recognition performance in adverse acoustic conditions. The training procedures of most ASR systems yield recognisers with a relatively rigid image of the world: only those acoustic variations that actually occurred in the training data are accounted for. Since training data is usually clean (in the sense that care is taken to avoid noisy recording environments, channel noise, etc.), noise sources that are present when the system is operational result in a mismatch between the training and the test conditions. Such a mismatch may reduce recognition performance quite significantly. The aim of this research is to determine the extent to which the robustness of ASR systems against mismatched training and test conditions may be increased using acoustic backing-off as an implementation of Missing Feature Theory.
Similar resources
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
A comparison of LPC and FFT-based acoustic features for noise robust ASR
Within the context of robust acoustic features for automatic speech recognition (ASR), we evaluated mel-frequency cepstral coefficients (MFCCs) derived from two spectral representation techniques, i.e. the fast Fourier transform (FFT) and linear predictive coding (LPC). ASR systems based on the two feature types were tested on a digit recognition task using continuous density hidden Markov ph...
A Hybrid Method for Automatic Speech Recognition Performance Improvement in Real World Noisy Environment
It is a well-known fact that speech recognition systems perform well when used in conditions similar to those used to train the acoustic models. However, mismatches degrade the performance. In adverse environments, it is very difficult to predict the category of real-world environmental noise in advance, and difficult to achieve environmental robustness. After do...
Publication date: 1999